A Probabilistic Chunker

Authors

  • Kuang-hua Chen
  • Hsin-Hsi Chen
Abstract

This paper proposes a probabilistic partial parser, which we call a chunker. The chunker partitions the input sentence into segments, or chunks. The idea is motivated by the observation that we read a sentence chunk by chunk. We train the chunker on the Susanne Corpus, a modified and reduced version of the Brown Corpus, using a bi-gram language model. The chunker is evaluated with both an outside test and an inside test. The preliminary results show that the chunker achieves a chunk correct rate of more than 98% and a sentence correct rate of 94% in the outside test, and a chunk correct rate of 99% and a sentence correct rate of 97% in the inside test. This simple but effective chunker design has been shown to be promising and can be extended to complete parsing and many other applications.
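
The abstract does not spell out the model, so the following is only a minimal sketch under stated assumptions. It treats chunking as choosing the segmentation of a part-of-speech sequence that maximizes a product of chunk-bigram probabilities P(c_i | c_{i-1}), found by a Viterbi-style search. The toy corpus, the add-alpha smoothing, the `max_len` limit, and the definitions used below for chunk correct rate (gold chunks reproduced exactly) and sentence correct rate (sentences with every chunk boundary right) are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy training data (invented for illustration): each sentence is a
# list of chunks, and each chunk is a tuple of Brown-style part-of-speech tags.
TRAIN = [
    [("AT", "NN"), ("VBD",), ("IN", "AT", "NN")],
    [("NP",), ("VBD",), ("AT", "JJ", "NN")],
    [("AT", "NN"), ("VBD",), ("AT", "NN")],
]
START = ("<s>",)  # pseudo-chunk marking the sentence start


def train(corpus):
    """Count chunk bigrams and predecessor counts; also return the vocabulary size."""
    bigrams, history = Counter(), Counter()
    for sent in corpus:
        prev = START
        for c in sent:
            bigrams[(prev, c)] += 1
            history[prev] += 1
            prev = c
    vocab = len({c for sent in corpus for c in sent}) + 1  # +1 reserves mass for unseen chunks
    return bigrams, history, vocab


def log_p(model, prev, cur, alpha=0.1):
    """Add-alpha smoothed log P(cur | prev); the smoothing scheme is an assumption."""
    bigrams, history, vocab = model
    return math.log((bigrams[(prev, cur)] + alpha) / (history[prev] + alpha * vocab))


def chunk(tags, model, max_len=4):
    """Viterbi search over all segmentations of tags, scored by the chunk bigram model."""
    n = len(tags)
    if n == 0:
        return []
    # best[(i, j)] = (score, k): best log-prob of tags[:j] whose last chunk is
    # tags[i:j], where the previous chunk starts at k (k == -1 means START).
    best = {}
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            cur = tuple(tags[i:j])
            if i == 0:
                best[(i, j)] = (log_p(model, START, cur), -1)
            else:
                best[(i, j)] = max(
                    (best[(k, i)][0] + log_p(model, tuple(tags[k:i]), cur), k)
                    for k in range(max(0, i - max_len), i)
                )
    # Pick the best final chunk, then follow back-pointers to the sentence start.
    i = max(range(max(0, n - max_len), n), key=lambda s: best[(s, n)][0])
    j, out = n, []
    while i != -1:
        out.append(tuple(tags[i:j]))
        i, j = best[(i, j)][1], i
    return out[::-1]


def spans(chunks):
    """Convert a chunk list into a set of (start, end) token offsets."""
    out, pos = set(), 0
    for c in chunks:
        out.add((pos, pos + len(c)))
        pos += len(c)
    return out


def correct_rates(predicted, gold):
    """Return (chunk correct rate, sentence correct rate) under the assumed definitions."""
    chunk_hits = chunk_total = sent_hits = 0
    for p, g in zip(predicted, gold):
        ps, gs = spans(p), spans(g)
        chunk_hits += len(ps & gs)
        chunk_total += len(gs)
        sent_hits += ps == gs
    return chunk_hits / chunk_total, sent_hits / len(gold)


if __name__ == "__main__":
    model = train(TRAIN)
    print(chunk(["AT", "NN", "VBD", "AT", "NN"], model))
    # [('AT', 'NN'), ('VBD',), ('AT', 'NN')]
    inside = [chunk([t for c in s for t in c], model) for s in TRAIN]
    print(correct_rates(inside, TRAIN))  # inside test on the training sentences
    # (1.0, 1.0)
```

Because the bigram score depends on the previous chunk, the search state is the last chunk's span rather than just a position, which keeps the search exact at O(n * max_len^2) transitions. An outside test would instead call `chunk` on held-out sentences that were not passed to `train`.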

Similar resources

Development of a Partially Bracketed Corpus with Part-of-Speech Information Only

Research based on a treebank is active for many natural language applications. However, the work to build a large scale treebank is laborious and tedious. This paper proposes a probabilistic chunker to help the development of a partially bracketed corpus. The chunker partitions the part-of-speech sequence into segments called chunks. Rather than using a treebank as our training corpus, a corpus...

Robust German Noun Chunking With a Probabilistic Context-Free Grammar

We present a noun chunker for German which is based on a head-lexicalised probabilistic context-free grammar. A manually developed grammar was semi-automatically extended with robustness rules in order to allow parsing of unrestricted text. The model parameters were learned from unlabelled training data by a probabilistic context-free parser. For extracting noun chunks, the parser generates all ...

Chunker and Shallow Parser for Free Word Order Languages: An Approach based on Valency Theory and Feature Structures

Free word order languages have relatively unrestricted local word group or phrase structures that make the problem of chunking quite challenging. On the other hand, a robust chunker can drastically reduce the complexity of a parser that follows. We present here a computational framework for chunking of free word order languages based on a generalization of the valency theory. Every word has cer...

Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and its Automatic Evaluation

The partial parser is motivated by an intuition (Abney, 1991). Acquiring noun phrases from running text is useful for many applications, such as word grouping and terminology indexing. The reported literature adopts either a purely probabilistic approach or a purely rule-based noun-phrase grammar to tackle this problem. In this paper, we apply a probabilistic chunker to deciding the implicit bo...

Acquisition of Subcategorization Frames from Large Scale Texts

Subcategorization frames are useful for many applications. Because of many ambiguities, extracting them is not straightforward. In this paper, a probabilistic chunker is used to determine the plausible phrase boundaries, and a finite state mechanism, SUBCAT-TRACTOR, is proposed to extract 23 subcategorization frames. To avoid the problems introduced by compound nouns, a noun-phrase ext...

Publication date: 1993